Methods and kits for labeling cellular molecules

ABSTRACT

Methods of uniquely labeling or barcoding molecules within a cell, a plurality of cells, and/or a tissue are provided. Kits for uniquely labeling or barcoding molecules within a cell, a plurality of cells, and/or a tissue are also provided. The molecules to be labeled may include, but are not limited to, RNAs, cDNAs, DNAs, proteins, peptides, and/or antigens.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 17/122,321, filed Dec. 15, 2020, which is a continuation of U.S. patent application Ser. No. 14/941,433, filed Nov. 13, 2015, now issued U.S. Pat. No. 10,900,065, issued on Jan. 26, 2021, which claims the benefit of U.S. Provisional Application No. 62/080,055, filed Nov. 14, 2014, each of which is hereby incorporated by reference in its entirety.

STATEMENT OF GOVERNMENT LICENSE RIGHTS

This invention was made with government support under Grant No. R01 CA207029, awarded by the National Institutes of Health, and Grant No. CCF-1317653, awarded by the National Science Foundation. The government has certain rights in the invention.

STATEMENT REGARDING SEQUENCE LISTING

The Sequence Listing XML associated with this application is provided in XML format and is hereby incorporated by reference into the specification. The name of the XML file containing the sequence listing is 3915_P1043USCON10UW_Seq_List_20230313. The XML file is 34 KB; was created on Mar. 13, 2023; and is being submitted electronically via Patent Center with the filing of the specification.

TECHNICAL FIELD

The present disclosure relates generally to methods of uniquely labeling or barcoding molecules within a cell, a plurality of cells, and/or a tissue. The present disclosure also relates to kits for uniquely labeling molecules within a cell, a plurality of cells, and/or a tissue. In particular, the methods and kits may relate to the labeling of RNAs, cDNAs, DNAs, proteins, peptides, and/or antigens.

BACKGROUND

Next Generation Sequencing (NGS) can be used to identify and/or quantify individual transcripts from a sample of cells. However, such techniques may be too complicated to perform on individual cells in large samples. In such methods, RNA transcripts are generally purified from lysed cells (i.e., cells that have been broken apart), followed by conversion of the RNA transcripts into complementary DNA (cDNA) using reverse transcription. The cDNA sequences can then be sequenced using NGS. In such a procedure, all of the cDNA sequences are mixed together before sequencing, such that RNA expression is measured for a whole sample and individual sequences cannot be linked back to an individual cell.

Methods for uniquely labeling or barcoding transcripts from individual cells can involve the manual separation of individual cells into separate reaction vessels and can require specialized equipment. An alternative approach to sequencing individual transcripts in cells is to use microscopy to identify individual fluorescent bases. However, this technique can be difficult to implement and limited to sequencing a low number of cells.

BRIEF DESCRIPTION OF THE DRAWINGS

The embodiments disclosed herein will become more fully apparent from the following description and appended claims, taken in conjunction with the accompanying drawings.

FIG. 1 depicts ligation of nucleic acid tags to form a label or barcode.

FIG. 2 is a schematic representation of the formation of cDNA by in situ reverse transcription. Panel A depicts a cell that is fixed and permeabilized. Panel B depicts addition of a poly(T) primer, which can template the reverse transcription of polyadenylated transcripts. Panel C depicts addition of a random hexamer, which can template the reverse transcription of substantially any transcript. Panel D depicts the addition of a primer that is designed to target a specific transcript such that only a subset of transcripts may be amplified. Panel E depicts the cell of Panel A after reverse transcription, illustrating a cDNA hybridized to an RNA.

FIG. 3A depicts non-templated ligation of a single-stranded adapter to an RNA fragment.

FIG. 3B depicts ligation of a single-stranded adapter using a partial duplex with random hexamer primers.

FIG. 4 depicts primer binding.

FIG. 5 depicts primer binding followed by reverse transcription.

FIG. 6 depicts DNA-tagged antibodies for use in labeling cellular proteins.

FIG. 7 depicts aptamers for use in labeling cellular proteins.

FIG. 8 is a schematic representation of the dividing, tagging, and pooling of cells, according to an embodiment of the present disclosure. As depicted, cells can be divided between a plurality of reaction vessels. One cell is highlighted to show its path through the illustrated process.

FIG. 9A depicts an exemplary workflow, according to an embodiment of the present disclosure.

FIG. 9B depicts an exemplary workflow, according to another embodiment of the present disclosure.

FIG. 10 depicts a reverse transcription primer (BC_0055), according to an embodiment of the present disclosure.

FIG. 11 depicts an annealed, first-round barcode oligo, according to an embodiment of the present disclosure.

FIG. 12 depicts an annealed, second-round barcode oligo, according to an embodiment of the present disclosure.

FIG. 13 depicts an annealed, third-round barcode oligo, according to an embodiment of the present disclosure.

FIG. 14 depicts ligation stop oligos, according to an embodiment of the present disclosure.

FIG. 15 depicts a single-stranded DNA adapter oligo (BC_0047) ligated to the 3′ end of a cDNA, according to an embodiment of the present disclosure.

FIG. 16 depicts a PCR product formed using primers BC_0051 and BC_0062 and the 3′ adapter oligo (BC_0047) after it has been ligated to barcoded cDNA.

FIG. 17 depicts BC_0027, which includes the flow cell binding sequence and the binding site for the TRUSEQ™ read 1 primer, and BC_0063, which includes the flow cell binding sequence and the TRUSEQ™ multiplex read 2 and index binding sequence. FIG. 17 also illustrates a region for a sample index, which is GATCTG in this embodiment.

FIG. 18 is a scatter plot, wherein for each unique barcode combination the number of reads aligning to the human genome (x-axis) and the mouse genome (y-axis) are plotted.

DETAILED DESCRIPTION

The present disclosure relates generally to methods of uniquely labeling or barcoding molecules within a cell, a plurality of cells, and/or a tissue. The present disclosure also relates to kits for uniquely labeling or barcoding molecules within a cell, a plurality of cells, and/or a tissue. The molecules to be labeled may include, but are not limited to, RNAs, cDNAs, DNAs, proteins, peptides, and/or antigens.

It will be readily understood that the embodiments, as generally described herein, are exemplary. The following more detailed description of various embodiments is not intended to limit the scope of the present disclosure, but is merely representative of various embodiments. Moreover, the order of the steps or actions of the methods disclosed herein may be changed by those skilled in the art without departing from the scope of the present disclosure. In other words, unless a specific order of steps or actions is required for proper operation of the embodiment, the order or use of specific steps or actions may be modified.

The term “binding” is used broadly throughout this disclosure to refer to any form of attaching or coupling two or more components, entities, or objects. For example, two or more components may be bound to each other via chemical bonds, covalent bonds, ionic bonds, hydrogen bonds, electrostatic forces, Watson-Crick hybridization, etc.

A first aspect of the disclosure relates to methods of labeling nucleic acids. In some embodiments, the methods may comprise labeling nucleic acids in a first cell. The methods may comprise: (a) generating complementary DNAs (cDNAs) within a plurality of cells comprising the first cell by reverse transcribing RNAs using a reverse transcription primer comprising a 5′ overhang sequence; (b) dividing the plurality of cells into a number (n) of aliquots; (c) providing a plurality of nucleic acid tags to each of the n aliquots, wherein each labeling sequence of the plurality of nucleic acid tags provided into a given aliquot is the same, and wherein a different labeling sequence is provided into each of the n aliquots; (d) binding at least one of the cDNAs in each of the n aliquots to the nucleic acid tags; (e) combining the n aliquots; and (f) repeating steps (b), (c), (d), and (e) with the combined aliquot. In various embodiments, the plurality of cells may be selected from eukaryotic cells and prokaryotic cells. In various other embodiments, the plurality of cells may be selected from, but not limited to, at least one of mammalian cells, yeast cells, and/or bacterial cells.

In certain embodiments, each nucleic acid tag may comprise a first strand including a 3′ hybridization sequence extending from a 3′ end of a labeling sequence and a 5′ hybridization sequence extending from a 5′ end of the labeling sequence. Each nucleic acid tag may also comprise a second strand including an overhang sequence. The overhang sequence may include (i) a first portion complementary to at least one of the 5′ hybridization sequence and the 5′ overhang sequence and (ii) a second portion complementary to the 3′ hybridization sequence. In some embodiments, the nucleic acid tag (e.g., the final nucleic acid tag) may comprise a capture agent such as, but not limited to, a 5′ biotin. A cDNA labeled with a 5′ biotin-comprising nucleic acid tag may allow or permit the attachment or coupling of the cDNA to a streptavidin-coated magnetic bead. In some other embodiments, a plurality of beads may be coated with a capture strand (i.e., a nucleic acid sequence) that is configured to hybridize to a final sequence overhang of a barcode. In yet some other embodiments, cDNA may be purified or isolated by use of a commercially available kit (e.g., an RNEASY™ kit).

In various embodiments, step (f) (i.e., steps (b), (c), (d), and (e)) may be repeated a number of times sufficient to generate a unique series of labeling sequences for the cDNAs in the first cell. Stated another way, step (f) may be repeated a number of times such that the cDNAs in the first cell may have a first unique series of labeling sequences, the cDNAs in a second cell may have a second unique series of labeling sequences, the cDNAs in a third cell may have a third unique series of labeling sequences, and so on. The methods of the present disclosure may provide for the labeling of cDNA sequences from single cells with unique barcodes, wherein the unique barcodes may identify or aid in identifying the cell from which the cDNA originated. In other words, a portion, a majority, or substantially all of the cDNA from a single cell may have the same barcode, and that barcode may not be repeated in cDNA originating from one or more other cells in a sample (e.g., from a second cell, a third cell, a fourth cell, etc.).

In some embodiments, barcoded cDNA can be mixed together and sequenced (e.g., using NGS), such that data can be gathered regarding RNA expression at the level of a single cell. For example, certain embodiments of the methods of the present disclosure may be useful in assessing, analyzing, or studying the transcriptome (i.e., the different RNA species transcribed from the genome of a given cell) of one or more individual cells.
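By way of non-limiting illustration, the following minimal sketch shows how pooled sequencing reads might be grouped by barcode to recover per-cell expression counts. The record layout (a barcode paired with a gene name), the function name, and the toy gene names are assumptions made for illustration only and are not part of the disclosed methods.

```python
from collections import Counter, defaultdict


def count_transcripts_per_cell(reads):
    """Group (barcode, gene) records by barcode and count reads per gene.

    `reads` is an iterable of (barcode, gene) pairs; both fields are
    hypothetical stand-ins for values parsed from sequencing output.
    """
    per_cell = defaultdict(Counter)
    for barcode, gene in reads:
        per_cell[barcode][gene] += 1
    return per_cell


# Toy reads: each barcode is written as a series of tag labels joined by "-".
reads = [
    ("A1-C7-F3", "GAPDH"),
    ("A1-C7-F3", "ACTB"),
    ("A1-C7-F3", "GAPDH"),
    ("B2-C7-D9", "ACTB"),
]
counts = count_transcripts_per_cell(reads)
for barcode, genes in counts.items():
    print(barcode, dict(genes))
```

Because every cDNA from a given cell is expected to carry the same series of labeling sequences, reads that share a barcode can be attributed to one cell even though the cDNAs were mixed before sequencing.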

As discussed above, an aliquot or group of cells can be separated into different reaction vessels or containers and a first set of nucleic acid tags can be added to the plurality of cDNA transcripts. The aliquots of cells can then be regrouped, mixed, and separated again and a second set of nucleic acid tags can be added to the first set of nucleic acid tags. In various embodiments, the same nucleic acid tag may be added to more than one aliquot of cells in a single or given round of labeling. However, after repeated rounds of separating, tagging, and repooling, the cDNAs of each cell may be bound to a unique combination or sequence of nucleic acid tags that form a barcode. In some embodiments, cells in a single sample may be separated into a number of different reaction vessels. For example, the number of reaction vessels may include four 1.5 ml microcentrifuge tubes, a plurality of wells of a 96-well plate, or another suitable number and type of reaction vessels.
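By way of non-limiting illustration, the repeated dividing, tagging, and pooling described above may be simulated as follows. In this sketch each cell is assigned to an aliquot independently and uniformly at random in every round, which is a simplifying assumption; the function name, the parameter names, and the use of an aliquot index as a stand-in for a nucleic acid tag are likewise hypothetical.

```python
import random


def split_pool(cell_ids, n_aliquots, n_rounds, seed=0):
    """Simulate rounds of dividing, tagging, and pooling.

    Returns a dict mapping each cell to the tuple of aliquot indices it
    visited; the tuple stands in for the series of tags bound to its cDNAs.
    """
    rng = random.Random(seed)
    cells = list(cell_ids)
    barcodes = {cell: [] for cell in cells}
    for _ in range(n_rounds):
        # Divide: each cell lands in one aliquot (assumed uniform, independent).
        for cell in cells:
            aliquot = rng.randrange(n_aliquots)
            # Tag: every cell in a given aliquot receives that aliquot's tag.
            barcodes[cell].append(aliquot)
        # Pool: all cells are recombined before the next round.
    return {cell: tuple(tags) for cell, tags in barcodes.items()}


barcodes = split_pool(range(1000), n_aliquots=96, n_rounds=4)
n_unique = len(set(barcodes.values()))
print(f"{n_unique} distinct barcodes among {len(barcodes)} cells")
```

Within any single round many cells share a tag, but the combination of tags accumulated over several rounds is what distinguishes one cell from another.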

In certain embodiments, step (f) (i.e., steps (b), (c), (d), and (e)) may be repeated a number of times wherein the number of times is selected from 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, etc. In certain other embodiments, step (f) may be repeated a sufficient number of times such that the cDNAs of each cell would be likely to be bound to a unique barcode. The number of times may be selected to provide a greater than 50% likelihood, greater than 90% likelihood, greater than 95% likelihood, greater than 99% likelihood, or some other probability that the cDNAs in each cell are bound to a unique barcode. In yet other embodiments, step (f) may be repeated some other suitable number of times.
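By way of non-limiting illustration, the number of rounds needed to reach a chosen likelihood can be estimated with the following sketch. It assumes that every cell is equally likely to receive any of the possible barcodes and that cells are tagged independently; under those assumptions the probability that no other cell shares a given cell's barcode is (1 - 1/N)^(k - 1), where N is the number of possible barcodes and k is the number of cells. The function names and the example cell count are hypothetical.

```python
def prob_cell_unique(n_cells, n_aliquots, n_rounds):
    """Probability that no other cell shares a given cell's barcode.

    Assumes uniform, independent barcode assignment (an idealization).
    """
    n_barcodes = n_aliquots ** n_rounds
    return (1 - 1 / n_barcodes) ** (n_cells - 1)


def min_rounds(n_cells, n_aliquots, target, max_rounds=100):
    """Smallest number of rounds whose per-cell uniqueness exceeds `target`."""
    for rounds in range(1, max_rounds + 1):
        if prob_cell_unique(n_cells, n_aliquots, rounds) > target:
            return rounds
    raise ValueError("target not reached within max_rounds")


# Hypothetical example: 10,000 cells divided among 96 aliquots per round.
for target in (0.50, 0.90, 0.95, 0.99):
    print(target, min_rounds(10_000, 96, target))
```

Under these assumptions, raising the target likelihood or the number of cells increases the number of rounds required, whereas using more aliquots per round decreases it.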

In some embodiments, the methods of labeling nucleic acids in the first cell may comprise fixing the plurality of cells prior to step (a). For example, components of a cell may be fixed or cross-linked such that the components are immobilized or held in place. The plurality of cells may be fixed using formaldehyde in phosphate buffered saline (PBS). The plurality of cells may be fixed, for example, in about 4% formaldehyde in PBS. In various embodiments, the plurality of cells may be fixed using methanol (e.g., 100% methanol) at about −20° C. or at about 25° C. In various other embodiments, the plurality of cells may be fixed using methanol (e.g., 100% methanol), at between about −20° C. and about 25° C. In yet various other embodiments, the plurality of cells may be fixed using ethanol (e.g., about 70-100% ethanol) at about −20° C. or at room temperature. In yet various other embodiments, the plurality of cells may be fixed using ethanol (e.g., about 70-100% ethanol) at between about −20° C. and room temperature. In still various other embodiments, the plurality of cells may be fixed using acetic acid, for example, at about −20° C. In still various other embodiments, the plurality of cells may be fixed using acetone, for example, at about −20° C. Other suitable methods of fixing the plurality of cells are also within the scope of this disclosure.

In certain embodiments, the methods of labeling nucleic acids in the first cell may comprise permeabilizing the plurality of cells prior to step (a). For example, holes or openings may be formed in outer membranes of the plurality of cells. TRITON™ X-100 may be added to the plurality of cells, followed by the addition of HCl to form the one or more holes. About 0.2% TRITON™ X-100 may be added to the plurality of cells, for example, followed by the addition of about 0.1 N HCl. In certain other embodiments, the plurality of cells may be permeabilized using ethanol (e.g., about 70% ethanol), methanol (e.g., about 100% methanol), Tween 20 (e.g., about 0.2% Tween 20), and/or NP-40 (e.g., about 0.1% NP-40). In various embodiments, the methods of labeling nucleic acids in the first cell may comprise fixing and permeabilizing the plurality of cells prior to step (a).

In some embodiments, the cells may be adherent cells (e.g., adherent mammalian cells). Fixing, permeabilizing, and/or reverse transcription may be conducted or performed on adherent cells (e.g., on cells that are adhered to a plate). For example, adherent cells may be fixed, permeabilized, and/or undergo reverse transcription followed by trypsinization to detach the cells from a surface. Alternatively, the adherent cells may be detached prior to the separation and/or tagging steps. In some other embodiments, the adherent cells may be trypsinized prior to the fixing and/or permeabilizing steps.

In some embodiments, the methods of labeling nucleic acids in the first cell may comprise ligating at least two of the nucleic acid tags that are bound to the cDNAs. Ligation may be conducted before or after the lysing and/or the cDNA purification steps. Ligation can comprise covalently linking the 5′ phosphate sequences on the nucleic acid tags to the 3′ end of an adjacent strand or nucleic acid tag such that individual tags are formed into a continuous, or substantially continuous, barcode sequence that is bound to the 3′ end of the cDNA sequence. In various embodiments, a double-stranded DNA or RNA ligase may be used with an additional linker strand that is configured to hold a nucleic acid tag together with an adjacent nucleic acid in a “nicked” double-stranded conformation. The double-stranded DNA or RNA ligase can then be used to seal the “nick.” In various other embodiments, a single-stranded DNA or RNA ligase may be used without an additional linker. In certain embodiments, the ligation may be performed within the plurality of cells.

FIG. 1 illustrates ligation of a plurality of nucleic acid tags to form a substantially continuous label or barcode. For example, after a plurality of nucleic acid tag additions, each cDNA transcript may be bound or linked to a series of nucleic acid tags. Use of a ligase may ligate or covalently link a portion of the nucleic acid tags to form a substantially continuous label or barcode that is bound or attached to a cDNA transcript.

In certain other embodiments, the methods may comprise lysing the plurality of cells (i.e., breaking down the cell structure) to release the cDNAs from within the plurality of cells, for example, after step (f). In some embodiments, the plurality of cells may be lysed in a lysis solution (e.g., 10 mM Tris-HCl (pH 7.9), 50 mM EDTA (pH 7.9), 0.2 M NaCl, 2.2% SDS, 0.5 mg/ml ANTI-RNase (a protein ribonuclease inhibitor; AMBION®) and 1000 mg/ml proteinase K (AMBION®)), for example, at about 55° C. for about 3 hours with shaking (e.g., vigorous shaking). In some other embodiments, the plurality of cells may be lysed using ultrasonication and/or by being passed through an 18-25 gauge syringe needle at least once. In yet some other embodiments, the plurality of cells may be lysed by being heated to about 70-90° C. For example, the plurality of cells may be lysed by being heated to about 70-90° C. for about one or more hours. The cDNAs may then be isolated from the lysed cells. In some embodiments, RNase H may be added to the cDNA to remove RNA. The methods may further comprise ligating at least two of the nucleic acid tags that are bound to the released cDNAs. In some other embodiments, the methods of labeling nucleic acids in the first cell may comprise ligating at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, etc. of the nucleic acid tags that are bound to the cDNAs.

In various embodiments, the methods of labeling nucleic acids in the first cell may comprise removing one or more unbound nucleic acid tags (e.g., washing the plurality of cells). For example, the methods may comprise removing a portion, a majority, or substantially all of the unbound nucleic acid tags. Unbound nucleic acid tags may be removed such that further rounds of the disclosed methods are not contaminated with one or more unbound nucleic acid tags from a previous round of a given method. In some embodiments, unbound nucleic acid tags may be removed via centrifugation. For example, the plurality of cells can be centrifuged such that a pellet of cells is formed at the bottom of a centrifuge tube. The supernatant (i.e., liquid containing the unbound nucleic acid tags) can be removed from the centrifuged cells. The cells may then be resuspended in a buffer (e.g., a fresh buffer that is free or substantially free of unbound nucleic acid tags). In another example, the plurality of cells may be coupled or linked to magnetic beads that are coated with an antibody that is configured to bind the cell membrane. The plurality of cells can then be pelleted using a magnet to draw them to one side of the reaction vessel. In some other embodiments, the plurality of cells may be placed in a cell strainer (e.g., a PLURISTRAINER® cell strainer) and washed with a wash buffer. For example, the plurality of cells may remain in the cell strainer while the wash buffer passes through the cell strainer. Wash buffer may include a surfactant, a detergent, and/or about 5-60% formamide.

As discussed above, the plurality of cells can be repooled and the method can be repeated any number of times, adding more tags to the cDNAs creating a set of nucleic acid tags that can act as a barcode. As more and more rounds are added, the number of paths that a cell can take increases and consequently the number of possible barcodes that can be created also increases. Given enough rounds and divisions, the number of possible barcodes will be much higher than the number of cells, resulting in each cell likely having a unique barcode. For example, if the division took place in a 96-well plate, after 4 divisions there would be 96^4 = 84,934,656 possible barcodes.
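By way of non-limiting illustration, the growth in the number of possible barcodes with each round can be computed directly, as in the following sketch. The function names are hypothetical, and the estimate of colliding cell pairs assumes uniform, independent barcode assignment, which is an idealization.

```python
def barcode_space(n_aliquots, n_rounds):
    """Number of distinct tag series after n_rounds divisions into n_aliquots."""
    return n_aliquots ** n_rounds


def expected_colliding_pairs(n_cells, n_barcodes):
    """Expected number of cell pairs sharing a barcode (uniform assumption)."""
    return n_cells * (n_cells - 1) / (2 * n_barcodes)


# Each additional round multiplies the number of possible paths by 96.
for rounds in range(1, 5):
    print(rounds, barcode_space(96, rounds))

# Hypothetical sample of 10,000 cells after four rounds in a 96-well plate.
pairs = expected_colliding_pairs(10_000, barcode_space(96, 4))
print(f"expected colliding pairs: {pairs:.2f}")
```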

In some embodiments, the reverse transcription primer may be configured to reverse transcribe all, or substantially all, RNA in a cell (e.g., a random hexamer with a 5′ overhang). In some other embodiments, the reverse transcription primer may be configured to reverse transcribe RNA having a poly(A) tail (e.g., a poly(dT) primer, such as a dT(15) primer, with a 5′ overhang). In yet some other embodiments, the reverse transcription primer may be configured to reverse transcribe predetermined RNAs (e.g., a transcript-specific primer). For example, the reverse transcription primer may be configured to barcode specific transcripts such that fewer transcripts may be profiled per cell, but such that each of the transcripts may be profiled over a greater number of cells.

FIG. 2 illustrates the formation of cDNA by in situ reverse transcription. Panel A depicts a cell that is fixed and permeabilized. Panel B depicts addition of a poly(T) primer, as discussed above, which can template the reverse transcription of polyadenylated transcripts. Panel C depicts addition of a random hexamer, as discussed above, which can template the reverse transcription of substantially any transcript. Panel D depicts the addition of a primer that is designed to target a specific transcript, as discussed above, such that only a subset of transcripts may be amplified. Panel E depicts the cell of Panel A after reverse transcription, illustrating a cDNA hybridized to an RNA.

Reverse transcription may be conducted or performed on the plurality of cells. In certain embodiments, reverse transcription may be conducted on a fixed and/or permeabilized plurality of cells. In some embodiments, M-MuLV reverse transcriptase (ENZYMATICS™) may be used in the reverse transcription. Any suitable method of reverse transcription is within the scope of this disclosure. For example, a reverse transcription mix may include a reverse transcription primer including a 5′ overhang and the reverse transcription primer may be configured to initiate reverse transcription and/or to act as a binding sequence for nucleic acid tags. In some other embodiments, a portion of a reverse transcription primer that is configured to bind to RNA and/or initiate reverse transcription may comprise one or more of the following: a random hexamer, a septamer, an octamer, a nonamer, a decamer, a poly(T) stretch of nucleotides, and/or one or more gene specific primers.

Another aspect of the disclosure relates to methods of uniquely labeling molecules within a cell or within a plurality of cells. In some embodiments, the method may comprise: (a) binding an adapter sequence, or universal adapter, to molecules within the plurality of cells; (b) dividing the plurality of cells into at least two primary aliquots, wherein the at least two primary aliquots comprise at least a first primary aliquot and a second primary aliquot; (c) providing primary nucleic acid tags to the at least two primary aliquots, wherein the primary nucleic acid tags provided to the first primary aliquot are different from the primary nucleic acid tags provided to the second primary aliquot; (d) binding the adapter sequences within each of the at least two primary aliquots with the provided primary nucleic acid tags; (e) combining the at least two primary aliquots; (f) dividing the combined primary aliquots into at least two secondary aliquots, the at least two secondary aliquots comprising at least a first secondary aliquot and a second secondary aliquot; (g) providing secondary nucleic acid tags to the at least two secondary aliquots, wherein the secondary nucleic acid tags provided to the first secondary aliquot are different from the secondary nucleic acid tags provided to the second secondary aliquot; and (h) binding the molecules within each of the at least two secondary aliquots with the provided secondary nucleic acid tags.

In certain embodiments, the method may further comprise step (i), i.e., repeating steps (e), (f), (g), and (h) with subsequent aliquots. Step (i) can be repeated a number of times sufficient to generate a unique series of nucleic acid tags for the molecules in a single cell. In various embodiments, the number of times may be selected from 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, etc. In certain other embodiments, step (i) may be repeated another suitable number of times.

In some embodiments, the molecules may be disposed within the cell or within the plurality of cells. In some other embodiments, the molecules may be coupled to the cell or to the plurality of cells. For example, the molecules may be cell-surface molecules. In yet some other embodiments, the molecules may be disposed within and/or coupled to the cell or the plurality of cells.

As discussed above, the method may comprise fixing and/or permeabilizing the plurality of cells prior to step (a). In various embodiments, each of the nucleic acid tags may comprise a first strand. The first strand may comprise a barcode sequence including a 3′ end and a 5′ end. The first strand may further comprise a 3′ hybridization sequence and a 5′ hybridization sequence flanking the 3′ end and the 5′ end of the barcode sequence, respectively. In some embodiments, each of the nucleic acid tags may comprise a second strand. The second strand may comprise a first portion complementary to at least one of the 5′ hybridization sequence and the adapter sequence and a second portion complementary to the 3′ hybridization sequence.
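By way of non-limiting illustration, the two-strand arrangement of a nucleic acid tag may be sketched as follows. All sequences shown are made up for illustration, the function names are hypothetical, and the orientation (the 3′ hybridization sequence of the first strand lying directly 5′ of the adapter sequence in the joined strand) is an assumption of this sketch rather than a requirement of the disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def make_tag(five_hyb, barcode, three_hyb, target):
    """Build the two strands of a tag, each written 5' to 3'.

    first strand : 5' hybridization sequence + barcode + 3' hybridization sequence
    second strand: a first portion complementary to `target` (an adapter
                   sequence or the 5' hybridization sequence of a prior tag)
                   followed by a second portion complementary to `three_hyb`
    """
    first = five_hyb + barcode + three_hyb
    second = reverse_complement(target) + reverse_complement(three_hyb)
    return first, second


# Made-up sequences for illustration only.
adapter = "CGAATG"
first, second = make_tag("GTGGCC", "AACGTGAT", "ATCCAC", adapter)
print(first)
print(second)
```

In this sketch the second strand spans the junction between the 3′ hybridization sequence and the adapter sequence, so that the two are held adjacent to one another.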

In certain embodiments, the molecules are macromolecules. In various embodiments, the molecules are selected from at least one of RNA, cDNA, DNA, protein, peptides, and/or antigens.

In some embodiments, the molecules are RNA and the adapter sequence may be single-stranded. Furthermore, step (a) may comprise one of ligating a 5′ end of the single-stranded adapter sequence to a 3′ end of the RNA and/or ligating a 3′ end of the single-stranded adapter sequence to a 5′ end of the RNA. In some other embodiments, the molecules are RNA and step (a) may comprise hybridizing the adapter sequence to the RNA.

Methods related to binding or coupling an adapter sequence to an RNA can be used, for example, in RNA transcriptome sequencing, ribosome profiling, small RNA sequencing, non-coding RNA sequencing, and/or RNA structure profiling. In some embodiments, the plurality of cells may be fixed and/or permeabilized. The 5′ end of a single-stranded adapter sequence may be ligated to the 3′ end of an RNA (see FIGS. 3A and 3B). In certain embodiments, the ligation may be conducted or performed by T4 RNA Ligase 1. In certain other embodiments, the ligation may be conducted by T4 RNA Ligase 1 with a single-stranded adapter sequence including a 5′ phosphate. In various embodiments, the ligation may be conducted by THERMOSTABLE 5′ APPDNA/RNA LIGASE™ (NEW ENGLAND BIOLABS®). In various other embodiments, the ligation may be conducted by THERMOSTABLE 5′ APPDNA/RNA LIGASE™ with a 5′ pre-adenylated single-stranded adapter sequence. Other suitable ligases and adapter sequences are also within the scope of this disclosure.

In some embodiments, the RNA can be labeled with an adapter sequence using hybridization, for example, via Watson-Crick base-pairing (see FIG. 4). After the labeling steps and/or cell lysis, as discussed above, the adapter sequence may be configured to prime reverse transcription to form or generate cDNA (see FIG. 5).

The 3′ end of a single-stranded adapter sequence may be ligated to the 5′ end of an RNA. In certain embodiments, the ligation may be conducted or performed by T4 RNA Ligase 1. In certain other embodiments, the ligation may be conducted by T4 RNA Ligase 1 with an RNA including a 5′ phosphate. In various embodiments, the ligation may be conducted by THERMOSTABLE 5′ APPDNA/RNA LIGASE™ (NEW ENGLAND BIOLABS®). In various other embodiments, the ligation may be conducted by THERMOSTABLE 5′ APPDNA/RNA LIGASE™ with a 5′ pre-adenylated RNA. As stated above, other suitable ligases and adapter sequences are also within the scope of this disclosure.

In some embodiments, the molecules may be cDNA. Methods related to binding or coupling an adapter sequence to a cDNA can be used, for example, in RNA transcriptome sequencing. In certain embodiments, the plurality of cells may be fixed and/or permeabilized. Reverse transcription may be performed on the plurality of fixed and/or permeabilized cells with a primer that includes the adapter sequence on the 5′ end. As discussed above, the 3′ end of the primer may be gene-specific, a random hexamer, or a poly(T) sequence. The resulting cDNA may include the adapter sequence on its 5′ end (see FIG. 5).

In some embodiments, wherein the molecules are DNA (e.g., genomic DNA), the method may further comprise digesting the DNA with a restriction enzyme prior to step (a). Furthermore, step (a) may comprise ligating the adapter sequence to the digested DNA.

Methods related to binding or coupling an adapter sequence to a DNA may be used, for example, in whole genome sequencing, targeted genome sequencing, DNase-Seq, ChIP-sequencing, and/or ATAC-seq. In certain embodiments, one or more restriction enzymes may be used to digest DNA into at least one of blunt end fragments and/or fragments having overhang sequences. A partial double-stranded sequence with the single-stranded universal adapter or adapter sequence protruding on one end can be ligated to the digested genomic DNA. For example, a partial double-stranded sequence with the single-stranded adapter sequence having an overhang, wherein the overhang is compatible with the overhang generated by the one or more restriction enzymes, may be ligated to the digested genomic DNA.

In various embodiments, adapter sequences can be integrated (e.g., directly integrated) into genomic DNA using Tn5 transposase and the transposase can be released to expose the adapter sequences by addition of sodium dodecyl sulfate (SDS). Other transposases and methods of integrating the adapter sequences into genomic DNA are also within the scope of this disclosure.

In certain embodiments, the molecules are protein, peptide, and/or antigen, and the adapter sequence may be bound to a unique identifier sequence (e.g., comprising nucleic acids) that is coupled to an antibody. The unique identifier sequence may be configured to uniquely identify the antibody to which the unique identifier sequence is bound. Furthermore, step (a) may comprise binding the antibodies, which comprise each of the adapter sequence and the unique identifier sequence, to the protein, peptide, and/or antigen. In certain other embodiments, the molecules are protein, peptide, and/or antigen, and the adapter sequence may be integrated in an aptamer. Furthermore, step (a) may comprise binding the aptamer to the protein, peptide, and/or antigen.

Methods related to binding or coupling an adapter sequence to a protein, a peptide, and/or an antigen may be used, for example, in protein quantification, peptide quantification, and/or antigen quantification. In various embodiments, the adapter sequence can be attached (e.g., chemically attached) to an antibody. For example, the adapter sequence can be attached to an antibody using chemistry known to the skilled artisan for mediating DNA-protein bonds. Antibodies for different proteins can be labeled with nucleic acid sequences or strands that include a unique identifier sequence in addition to the adapter sequence. The antibody, or set of antibodies, may then be used in an immunostaining experiment to label a protein, or set of proteins, in fixed and/or permeabilized cells or tissue (see FIG. 6). Subsequently, the cells may undergo a labeling or barcoding procedure as disclosed herein.

In some embodiments, the nucleic acid sequences (e.g., the DNA molecules) attached or bound to the antibodies can be released from the antibodies and/or adapter sequences. A sequencing reaction can reveal a unique identifier sequence associated with a given protein as well as the label or barcode associated with a unique cell or cells. In certain embodiments, such a method may reveal or identify the number and/or type of proteins present in one or more cells.

In various embodiments, a DNA aptamer and/or an RNA aptamer can be used instead of, or in addition to, a nucleic acid-modified (or DNA-modified) antibody as described above (see FIG. 7). The adapter sequence (and target protein-specific antibody) may be integrated (e.g., directly integrated) into the sequence of a given aptamer.

Another aspect of the disclosure relates to methods of barcoding nucleic acids within a cell. In some embodiments, the methods of barcoding nucleic acids within a cell may comprise: (a) generating cDNAs within a plurality of cells by reverse transcribing RNAs using a reverse transcription primer comprising a 5′ overhang sequence; (b) dividing the plurality of cells into at least two aliquots; (c) providing a plurality of nucleic acid tags to each of the at least two aliquots, wherein each barcode sequence of the plurality of nucleic acid tags introduced into a given aliquot is the same, and wherein a different barcode sequence is introduced into each aliquot; (d) binding at least one of the cDNAs in each of the at least two aliquots to the nucleic acid tags; (e) combining the at least two aliquots; and (f) repeating steps (b), (c), (d), and (e) at least once with the combined aliquot.

In certain embodiments, each nucleic acid tag may comprise a first strand comprising a 3′ hybridization sequence extending from a 3′ end of a barcode sequence and a 5′ hybridization sequence extending from a 5′ end of the barcode sequence. Each nucleic acid tag may also comprise a second strand comprising an overhang sequence, wherein the overhang sequence comprises (i) a first portion complementary to at least one of the 5′ hybridization sequence and the 5′ overhang sequence and (ii) a second portion complementary to the 3′ hybridization sequence.

FIG. 8 depicts dividing, tagging, and pooling of cells, according to an embodiment of the present disclosure. Cells that have been reverse transcribed can be divided between reaction vessels or wells. In FIG. 8, 4 wells are shown. As discussed above, however, any suitable number of reaction vessels or wells may be used. One cell is highlighted to show its path through the process. As depicted, the highlighted cell first ends up in well ‘a’, wherein it has the 1st tag added to it, which hybridizes to the overhang of all the cDNA transcripts (shown in the box). The tag carries a unique barcode region ‘a’, identifying the well that the cell was in. After hybridization, all cells are washed to remove excess tags, regrouped, and then split again between the same number of wells. The highlighted cell then ends up in well ‘c’ and has a 2nd tag added to it identifying the well it was in. After the second round, the cells could have taken 4²=16 possible paths through the wells. The process can be repeated, adding more tags to the cDNA transcripts and increasing the number of possible paths the cells can take. FIGS. 9A and 9B depict two exemplary workflows, according to embodiments of the present disclosure.
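The path counting described for FIG. 8 can be illustrated with a short Python sketch. It is offered only as an illustration and is not part of the disclosed method: the function names `possible_paths`, `split_pool`, and `fraction_unique`, the fixed random seed, and the assumption that each cell lands in a uniformly random well in every round are all choices made for this example.

```python
import random
from collections import Counter

def possible_paths(wells: int, rounds: int) -> int:
    """Number of distinct well sequences a cell can take."""
    return wells ** rounds

def split_pool(n_cells: int, wells: int, rounds: int, seed: int = 0):
    """Give every cell a random well index in each round and return its path.

    Uniform random assignment is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [tuple(rng.randrange(wells) for _ in range(rounds))
            for _ in range(n_cells)]

def fraction_unique(paths) -> float:
    """Fraction of cells whose path (barcode combination) no other cell shares."""
    counts = Counter(paths)
    return sum(1 for p in paths if counts[p] == 1) / len(paths)

print(possible_paths(4, 2))  # 16, matching the two-round, 4-well case of FIG. 8
```

Calling `fraction_unique(split_pool(n_cells, wells, rounds))` with different well and round counts gives a rough feel for how quickly additional rounds reduce the chance that two cells share the same combination of tags.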

Another aspect of the disclosure relates to kits for labeling nucleic acids within at least a first cell. In some embodiments, the kit may comprise at least one reverse transcription primer comprising a 5′ overhang sequence. The kit may also comprise a plurality of first nucleic acid tags. Each first nucleic acid tag may comprise a first strand. The first strand may include a 3′ hybridization sequence extending from a 3′ end of a first labeling sequence and a 5′ hybridization sequence extending from a 5′ end of the first labeling sequence. Each first nucleic acid tag may further comprise a second strand. The second strand may include an overhang sequence, wherein the overhang sequence may comprise (i) a first portion complementary to at least one of the 5′ hybridization sequence and the 5′ overhang sequence of the reverse transcription primer and (ii) a second portion complementary to the 3′ hybridization sequence.

The kit may further comprise a plurality of second nucleic acid tags. Each second nucleic acid tag may comprise a first strand. The first strand may include a 3′ hybridization sequence extending from a 3′ end of a second labeling sequence and a 5′ hybridization sequence extending from a 5′ end of the second labeling sequence. Each second nucleic acid tag may further comprise a second strand. The second strand may comprise an overhang sequence, wherein the overhang sequence may comprise (i) a first portion complementary to at least one of the 5′ hybridization sequence and the 5′ overhang sequence of the reverse transcription primer and (ii) a second portion complementary to the 3′ hybridization sequence. In some embodiments, the first labeling sequence may be different from the second labeling sequence.

In some embodiments, the kit may also comprise one or more additional pluralities of nucleic acid tags. Each nucleic acid tag of the one or more additional pluralities of nucleic acid tags may comprise a first strand. The first strand may include a 3′ hybridization sequence extending from a 3′ end of a labeling sequence and a 5′ hybridization sequence extending from a 5′ end of the labeling sequence. Each nucleic acid tag of the one or more additional pluralities of nucleic acid tags may also comprise a second strand. The second strand may include an overhang sequence, wherein the overhang sequence comprises (i) a first portion complementary to at least one of the 5′ hybridization sequence and the 5′ overhang sequence of the reverse transcription primer and (ii) a second portion complementary to the 3′ hybridization sequence. In some embodiments, the labeling sequence may be different in each given additional plurality of nucleic acid tags.

In various embodiments, the kit may further comprise at least one of a reverse transcriptase, a fixation agent, a permeabilization agent, a ligation agent, and/or a lysis agent.

Another aspect of the disclosure relates to kits for labeling molecules within at least a first cell. For example, the kits as disclosed above may be adapted to label one or more of RNA, cDNA, DNA, protein, peptides, or antigens within at least a first cell.

As will be understood by one of ordinary skill in the art, each embodiment disclosed herein can comprise, consist essentially of, or consist of its particular stated element, step, ingredient, or component. As used herein, the transition term “comprise” or “comprises” means includes, but is not limited to, and allows for the inclusion of unspecified elements, steps, ingredients, or components, even in major amounts. The transitional phrase “consisting of” excludes any element, step, ingredient or component not specified. The transition phrase “consisting essentially of” limits the scope of the embodiment to the specified elements, steps, ingredients or components, and to those that do not materially affect the embodiment.

Unless otherwise indicated, all numbers expressing quantities of ingredients, properties such as molecular weight, reaction conditions, and so forth used in the specification and claims are to be understood as being modified in all instances by the term “about.” Accordingly, unless indicated to the contrary, the numerical parameters set forth in the specification and attached claims are approximations that may vary depending upon the desired properties sought to be obtained by the present disclosure. At the very least, and not as an attempt to limit the application of the doctrine of equivalents to the scope of the claims, each numerical parameter should at least be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. When further clarity is required, the term “about” has the meaning reasonably ascribed to it by a person skilled in the art when used in conjunction with a stated numerical value or range, i.e., denoting somewhat more or somewhat less than the stated value or range, to within a range of ±20% of the stated value; ±19% of the stated value; ±18% of the stated value; ±17% of the stated value; ±16% of the stated value; ±15% of the stated value; ±14% of the stated value; ±13% of the stated value; ±12% of the stated value; ±11% of the stated value; ±10% of the stated value; ±9% of the stated value; ±8% of the stated value; ±7% of the stated value; ±6% of the stated value; ±5% of the stated value; ±4% of the stated value; ±3% of the stated value; ±2% of the stated value; or ±1% of the stated value.

Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the disclosure are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors necessarily resulting from the standard deviation found in their respective testing measurements.

The terms “a,” “an,” “the” and similar referents used in the context of describing the disclosure (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range. Unless otherwise indicated herein, each individual value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein is intended merely to better illuminate the disclosure and does not pose a limitation on the scope of the disclosure otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the disclosure.

Groupings of alternative elements or embodiments of the disclosure disclosed herein are not to be construed as limitations. Each group member may be referred to and claimed individually or in any combination with other members of the group or other elements found herein. It is anticipated that one or more members of a group may be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.

Definitions and explanations used in the present disclosure are meant and intended to be controlling in any future construction unless clearly and unambiguously modified in the following examples or when application of the meaning renders any construction meaningless or essentially meaningless. In cases where the construction of the term would render it meaningless or essentially meaningless, the definition should be taken from Webster's Dictionary, 3rd Edition or a dictionary known to those of ordinary skill in the art, such as the Oxford Dictionary of Biochemistry and Molecular Biology (Ed. Anthony Smith, Oxford University Press, Oxford, 2004).

EXAMPLES

The following examples are illustrative of disclosed methods and compositions. In light of this disclosure, those of skill in the art will recognize that variations of these examples and other examples of the disclosed methods and compositions would be possible without undue experimentation.

Example 1—Fixation and Reverse Transcription

NIH/3T3 (mouse) and HeLa-S3 (human) cells can be grown to confluence on two separate 10 cm cell culture plates. The cells can be rinsed twice with 10 ml 1× phosphate buffered saline (PBS), 1 ml of 0.05% trypsin can be added to each plate, and the plates can be incubated at 37° C. for 5 minutes. The cells can be detached by tilting each plate at a 45° angle while pipetting trypsin across the plates, which can be continued until all, or substantially all, of the cells are detached. Each cell line can be transferred into its own 15 ml conical centrifuge tube (FALCON™). 2 ml of Dulbecco's Modified Eagle Medium (DMEM) with 10% fetal bovine serum (FBS) can be added to each tube. The number of cells in each tube can be calculated (e.g., with a hemocytometer or on a flow cytometer). For example, 200 μl of the sample can be transferred from each tube into separate 1.7 ml microcentrifuge tubes (EPPENDORF®) and 100 μl of the sample can be run on an ACCURI™ Flow Cytometer to calculate the cell concentration.

The same number of cells from each tube can be combined into a new single 15 ml conical centrifuge tube (FALCON™), using as many cells as possible. A 5 minute spin can be conducted at 500×g in a 15 ml conical centrifuge tube (FALCON™). It may be helpful to use a bucket centrifuge so that the cells are pelleted at the bottom of the tube rather than on the side of the tube. The liquid can be aspirated without disturbing the cell pellet and the cells can be resuspended in 500 μl of 4% formaldehyde. The cells can then be left at room temperature (i.e., 20-25° C.) for 10 minutes. 1.5 ml of 0.5% TRITON™ X-100 can be added to the tube and mixed gently with a pipette. The tube can then be spun at 500×g for 5 minutes. Again, the liquid can be aspirated without disturbing the pellet and the pellet can be washed twice with 1 ml PBS without resuspending the pellet. If washing disturbs the pellet, the second wash can be skipped. The pellet can then be resuspended in 1 ml 0.1 N HCl and incubated at room temperature for 5 minutes.

2 ml of Tris-HCl (pH 8.0) can be added to a new 15 ml conical centrifuge tube (FALCON™). The fixed cells in HCl, from above, can be transferred to the tube with Tris-HCl so as to neutralize the HCl. The number of cells in the tube can then be calculated as discussed above (e.g., with a hemocytometer or on a flow cytometer). The fixed cells in Tris-HCl can be spun down at 500×g for 5 minutes and the liquid can be aspirated without disturbing the pellet. The pellet can be washed twice with 1 ml RNase-free molecular grade water, without disturbing the pellet. The cells can then be resuspended to a concentration of 2.5 million cells/ml (to do this, the concentration calculated before the last spin step can be used).

A reverse transcription mix can be made (55 μl M-MuLV reverse transcriptase buffer (ENZYMATICS™), 55 μl M-MuLV reverse transcriptase (ENZYMATICS™), 5.5 μl dNTPs (25 mM per base), 3.44 μl RNase inhibitor (ENZYMATICS™, 40 units/μl), 210.4 μl nuclease-free water, and 2.75 μl RT Primer (BC_0055, 100 μM)). In a well of a 24-well cell culture plate, 300 μl of the reverse transcription mix can be combined with 200 μl of the fixed cells (˜500,000 cells) and mixed gently by pipetting. The mixture can then be incubated at room temperature for 10 minutes to allow the reverse transcription primer to anneal and the mixture can then be incubated at 37° C. in a humidified incubator overnight (i.e., ˜16 hours).
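The volumes in the reverse transcription mix lend themselves to a quick arithmetic check. The Python sketch below is illustrative only. It uses the volumes and stock concentrations stated above, while the helper names (`total_volume_ul`, `final_concentration`, `cells_in_reaction`) are hypothetical and introduced just for this example.

```python
# Volumes (microliters) of the reverse transcription mix as listed in the text.
RT_MIX_UL = {
    "M-MuLV reverse transcriptase buffer": 55.0,
    "M-MuLV reverse transcriptase": 55.0,
    "dNTPs (25 mM per base)": 5.5,
    "RNase inhibitor (40 units/ul)": 3.44,
    "nuclease-free water": 210.4,
    "RT primer BC_0055 (100 uM)": 2.75,
}

MIX_USED_UL = 300.0    # mix combined with cells in one well
CELLS_UL = 200.0       # fixed cells added to the well
CELLS_PER_ML = 2.5e6   # cell concentration after resuspension

def total_volume_ul(mix) -> float:
    """Total volume of the prepared mix."""
    return sum(mix.values())

def final_concentration(stock: float, component_ul: float, mix) -> float:
    """Concentration in the final reaction, in the same unit as `stock`.

    The component is diluted first into the mix, then into mix plus cells.
    """
    in_mix = stock * component_ul / total_volume_ul(mix)
    return in_mix * MIX_USED_UL / (MIX_USED_UL + CELLS_UL)

def cells_in_reaction() -> float:
    """Number of cells contributed by the 200 ul cell suspension."""
    return CELLS_PER_ML * CELLS_UL / 1000.0

print(round(total_volume_ul(RT_MIX_UL), 2))  # 332.09
print(cells_in_reaction())                   # 500000.0
```

The prepared mix (about 332 μl) slightly exceeds the 300 μl that is used, and `final_concentration(100.0, 2.75, RT_MIX_UL)` gives the approximate primer concentration in the 500 μl reaction in μM.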

A primer that can be used for reverse transcription (BC_0055) is depicted in FIG. 10. This is an anchored primer, designed to bind the start of a poly(A) tail of a messenger RNA. The primer may be synthesized with all 4 bases at the 3′ end (N) and every base except T at the second-most 3′ position (V). The primer can also include 15 consecutive dTs. In some embodiments, the primer may include more than 15 dTs. In some other embodiments, the primer may include fewer than 15 dTs. In embodiments wherein the primer includes fewer than 15 dTs, the melting temperature of the primer may be lowered. The domain s0 may not hybridize to messenger RNAs, but may instead provide an accessible binding domain for a linker oligo. The primer also includes a 5′ phosphate that can allow ligation of the primer to another oligo by T4 DNA ligase.

Example 2—Preparation of Barcodes

The barcodes were ordered in 96-well plates at 100 μM concentrations. Each barcode was annealed with its corresponding linker oligo (see FIGS. 10-12).

FIG. 11 depicts an annealed, first-round barcode oligo. 96 first-round barcode oligos with unique sequences in domain i8a were used. In the first round, the unique sequence in domain i8a is the region of the sequence that is used as a barcode. By varying 8 nucleotides, there are 65,536 possible unique sequences. In some embodiments, more than 8 nucleotides may be present in domain i8a. In some other embodiments, fewer than 8 nucleotides may be present in domain i8a. The first-round barcodes were preannealed to a linker strand (BC_0056) through complementary sequences in domain s1. The linker strand can include complementary sequence to part of the reverse transcription primer (domain s0) that can allow it to hybridize and bring the 3′ end of the first-round barcodes in close proximity to the 5′ end of the reverse transcription primer. The phosphate of the reverse transcription primer can then be ligated to the 3′ end of the first-round barcodes by T4 DNA ligase. The domain s2 can provide an accessible binding domain for a linker oligo to be used in another round of barcoding. The first-round barcode oligos can include a 5′ phosphate that can allow ligation to the 3′ end of another oligo by T4 DNA ligase.
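The figure of 65,536 follows from four possible bases at each of 8 positions. The short Python sketch below shows the calculation and how it changes when more or fewer nucleotides are varied. The names `sequence_space` and `enumerate_sequences` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

BASES = "ACGT"

def sequence_space(n_nucleotides: int) -> int:
    """Number of distinct sequences of the given length over A, C, G, T."""
    return len(BASES) ** n_nucleotides

def enumerate_sequences(n_nucleotides: int):
    """Yield every sequence of the given length (practical only for small n)."""
    for combo in product(BASES, repeat=n_nucleotides):
        yield "".join(combo)

print(sequence_space(8))        # 65536
print(96 / sequence_space(8))   # fraction of the 8-nt space used by 96 barcodes
```

Because only 96 of the possible 8-nucleotide sequences are used in a round, a large part of the sequence space remains unused.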

FIG. 12 depicts an annealed, second-round barcode oligo. 96 second-round barcode oligos with unique sequences in domain i8b were used. In the second round, the unique sequence in domain i8b is the region of the sequence that is used as a barcode. By varying 8 nucleotides, there are 65,536 possible unique sequences. In some embodiments, more than 8 nucleotides may be present in domain i8b. In some other embodiments, fewer than 8 nucleotides may be present in domain i8b. The second-round barcodes can be preannealed to a linker strand (BC_0058) through complementary sequences in domain s3. The linker strand can include complementary sequence to part of the first-round barcode oligo (domain s2) that can allow it to hybridize and bring the 3′ end of the second-round barcodes in close proximity to the 5′ end of the first-round barcode oligo. The phosphate of the first-round barcode oligo can then be ligated to the 3′ end of the second-round barcodes by T4 DNA ligase. The domain s4 can provide an accessible binding domain for a linker oligo to be used in another round of barcoding. The second-round barcode oligos can include a 5′ phosphate that can allow ligation to the 3′ end of another oligo by T4 DNA ligase.

FIG. 13 depicts an annealed, third-round barcode oligo. 96 third-round barcode oligos with unique sequences in domain i8c were used. In the third round, the unique sequence in domain i8c is the region of the sequence that is used as a barcode. By varying 8 nucleotides, there are 65,536 possible unique sequences. In some embodiments, more than 8 nucleotides may be present in domain i8c. In some other embodiments, fewer than 8 nucleotides may be present in domain i8c. The third round of barcodes can be preannealed to a linker strand (BC_0060) through complementary sequences in domain s5. The linker strand can include complementary sequence to part of the second-round barcode oligo (domain s4) that can allow it to hybridize and bring the 3′ end of the third-round barcodes in close proximity to the 5′ end of the second-round barcode oligo. The phosphate of the second-round barcode oligo can then be ligated to the 3′ end of the third-round barcodes by T4 DNA ligase. The third-round barcode oligos can be synthesized with unique molecular identifiers (UMI; see Islam et al., Nature Methods, 2014) consisting of 10 random nucleotides (domain UMI: NNNNNNNNNN). Due to PCR amplification bias, multiple sequencing reads can originate from the same cDNA. Using a UMI, each cDNA may be counted only once. The third-round barcodes can also include a domain corresponding to part of the ILLUMINA® TruSeq adapter. The third-round barcodes can be synthesized with a biotin molecule at the 5′ end so that fully barcoded cDNA can be isolated with streptavidin coated magnetic beads.
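The idea of counting each cDNA only once with a UMI can be sketched in Python as follows. This is a minimal illustration rather than the analysis actually used: the function name `count_molecules`, the example gene name, the barcode labels, and the choice to collapse only exactly matching UMIs (with no correction for sequencing errors) are all assumptions of the sketch.

```python
from collections import defaultdict

def count_molecules(reads):
    """Count distinct UMIs per (cell barcode combination, gene).

    `reads` is an iterable of (cell_barcode, gene, umi) tuples. Reads that
    share all three values are treated as PCR duplicates of one cDNA.
    """
    umis = defaultdict(set)
    for cell_barcode, gene, umi in reads:
        umis[(cell_barcode, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}

# Hypothetical reads: labels such as "a-c-b" stand for a three-round well path.
example_reads = [
    ("a-c-b", "GeneX", "ACGTACGTAC"),
    ("a-c-b", "GeneX", "ACGTACGTAC"),  # duplicate of the read above
    ("a-c-b", "GeneX", "TTGCATGCAA"),
    ("d-a-a", "GeneX", "ACGTACGTAC"),  # same UMI, different cell
]

print(count_molecules(example_reads))
# {('a-c-b', 'GeneX'): 2, ('d-a-a', 'GeneX'): 1}
```

In this example, four reads collapse to three counted molecules, because the two identical reads from the same cell are attributed to a single cDNA.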

Starting from a 100 μM stock of each barcode oligo (i.e., in 96-well plates, one for each round), 11 μl of barcode oligo were transferred to 96-well PCR plates. To the plate with the round 1 barcodes, 9 μl of BC_0056 (100 μM stock) were added to each well. To the plate with the round 2 barcodes, 9 μl of BC_0058 (100 μM stock) were added to each well. To the plate with the round 3 barcodes, 9 μl of BC_0060 (100 μM stock) were added to each well. Each plate was then placed in a thermocycler, with the following program, to anneal the barcodes with the corresponding linker oligo: heat to 90° C., reduce heat 0.1° C./second, and stop when the temperature reaches 25° C. 2.2 μl were transferred from each well having the round 1 barcodes into a new 96-well plate (referred to as plate L1). 3.8 μl were transferred from each well with the round 2 barcodes into a new 96-well plate (referred to as plate L2). 6.1 μl were transferred from each well with the round 3 barcodes into a new 96-well plate (referred to as plate L3).
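The annealing step involves simple dilution and timing arithmetic, sketched below in Python for convenience. The volumes, stock concentration, and ramp rate come from the text. The helper names (`diluted_um`, `picomoles`, `ramp_seconds`) are hypothetical, and the sketch assumes the linker stocks are at the stated 100 μM.

```python
STOCK_UM = 100.0    # barcode and linker stock concentration (micromolar)
BARCODE_UL = 11.0   # barcode oligo per well
LINKER_UL = 9.0     # linker oligo per well

# Volume of annealed barcode/linker moved into each ligation plate (microliters).
PLATE_TRANSFER_UL = {"L1": 2.2, "L2": 3.8, "L3": 6.1}

def diluted_um(stock_um: float, vol_ul: float, total_ul: float) -> float:
    """Concentration after diluting vol_ul of stock into total_ul."""
    return stock_um * vol_ul / total_ul

def picomoles(conc_um: float, vol_ul: float) -> float:
    """Amount in pmol; 1 micromolar equals 1 pmol per microliter."""
    return conc_um * vol_ul

def ramp_seconds(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Duration of a linear temperature ramp."""
    return (start_c - end_c) / rate_c_per_s

total_ul = BARCODE_UL + LINKER_UL
barcode_um = diluted_um(STOCK_UM, BARCODE_UL, total_ul)
linker_um = diluted_um(STOCK_UM, LINKER_UL, total_ul)

for plate, vol in PLATE_TRANSFER_UL.items():
    print(plate, round(picomoles(barcode_um, vol), 1), "pmol barcode")
print(round(ramp_seconds(90.0, 25.0, 0.1)), "s ramp")
```

Under these assumptions, the annealed stock contains the barcode at about 55 μM and the linker at about 45 μM, and the cooling ramp from 90° C. to 25° C. takes roughly 11 minutes.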

Example 3—Preparation of Ligation Stop Oligos

After each round of ligation, the ligation can be stopped by adding an excess of oligo that is complementary to the linker strands (see FIG. 14). To stop each barcode ligation, oligo strands that are fully complementary to the linker oligos can be added. These oligos can bind the linker strands attached to unligated barcodes and displace the unligated barcodes through a strand displacement reaction. The unligated barcodes can then be completely single-stranded. As T4 DNA ligase is unable to ligate single-stranded DNA to other single-stranded DNA, the ligation reaction will stop progressing. To ensure that all linker oligos are bound by the complementary oligos, a molar excess of the complementary oligos (relative to the linker oligos) is added. To stop the first-round ligation, BC_0064 (complementary to BC_0056) is added. To stop the second-round ligation, BC_0065 (complementary to BC_0058) is added. To stop the third-round ligation, BC_0066 (complementary to BC_0060) is added.

Dilutions can be prepared for each stop ligation strand (BC_0064, BC_0065, BC_0066) as follows: 264 μl stop ligation strand (BC_0064, BC_0065, BC_0066), 300 μl 10×T4 DNA Ligase Buffer, and 636 μl nuclease-free water.

Example 4—Ligation of Barcodes to cDNA

5 μl 10% TRITON™ X-100 can be added to the reverse transcription reaction (to a final concentration of 0.1%) in the above-described 24-well plate. The reverse transcription (RT) reaction with cells can be transferred to a 15 ml conical centrifuge tube (FALCON™). The RT reactions can be spun for 10 minutes at 500×g and resuspended in 2 ml nuclease-free water. The cells can be combined with ligase mix (600 μl 10×T4 ligase buffer, 2040 μl of nuclease-free water, all of the resuspended cells (2000 μl), 100 μl of T4 DNA Ligase (NEW ENGLAND BIOLABS®, 400,000 units/ml), and 60 μl of 10% TRITON™ X-100) in a disposable pipetting reservoir (10 ml). The cells and ligase mix can be mixed by gently tilting the reservoir back and forth several times. Using a multichannel pipette, 40 μl of the cells in the ligase mix can be added to each well of annealed round 1 barcodes (plate L1). Each well can be mixed by pipetting up and down gently 2-3 times. The cells in the ligase mix can be incubated at 37° C. for 60 minutes.
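As a quick consistency check on the first-round ligation setup, the Python sketch below totals the ligase mix and compares it with the volume needed to dispense 40 μl into each well of a 96-well plate. The volumes are taken from the text, and the function names are hypothetical helpers introduced for this example.

```python
# Components of the ligase mix (microliters), as listed in the text.
LIGASE_MIX_UL = {
    "10x T4 ligase buffer": 600.0,
    "nuclease-free water": 2040.0,
    "resuspended cells": 2000.0,
    "T4 DNA ligase": 100.0,
    "10% Triton X-100": 60.0,
}

WELLS = 96          # wells in plate L1
PER_WELL_UL = 40.0  # cells in ligase mix dispensed per well

def total_volume_ul(mix) -> float:
    """Total volume of the mix in the reservoir."""
    return sum(mix.values())

def required_volume_ul(wells: int, per_well_ul: float) -> float:
    """Volume needed to fill every well."""
    return wells * per_well_ul

def strength_in_mix(stock: float, component_ul: float, mix) -> float:
    """Strength of a component in the mix, in the same unit as `stock`."""
    return stock * component_ul / total_volume_ul(mix)

total = total_volume_ul(LIGASE_MIX_UL)
needed = required_volume_ul(WELLS, PER_WELL_UL)
print(total)            # 4800.0
print(needed)           # 3840.0
print(total - needed)   # 960.0 spare volume for pipetting losses
```

With these values, `strength_in_mix(10.0, 600.0, LIGASE_MIX_UL)` gives the buffer strength in the reservoir (as a multiple of 1×) and `strength_in_mix(10.0, 60.0, LIGASE_MIX_UL)` gives the TRITON™ X-100 percentage, both before the mix is added to the annealed barcodes in the wells.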

10 μl of the diluted BC_0064 can be added to each well to stop the ligation. The samples can then be incubated at 37° C. for 30 minutes. All of the cells can be collected in a new disposable pipetting reservoir (10 ml). The cells can be passed through a 40 μm strainer into a new disposable pipetting reservoir (10 ml) using a 1 ml pipette. 100 μl of T4 DNA ligase (NEW ENGLAND BIOLABS®, 400,000 units/ml) can be added to the cells in the reservoir. The cells and ligase mix can be mixed by gently tilting the reservoir back and forth several times and, using a multichannel pipette, 40 μl of the cells in the ligase mix can be added to each well of annealed round 2 barcodes (plate L2). Each well can be mixed by pipetting up and down gently 2-3 times and the samples can then be incubated at 37° C. for 60 minutes.

10 μl of the diluted BC_0065 can be added to each well to stop the ligation. The samples can be incubated at 37° C. for 30 minutes and the cells can then be collected in a new disposable pipetting reservoir (10 ml). The cells can be passed through a 40 μm strainer into a new disposable pipetting reservoir (10 ml) using a 1 ml pipette. 100 μl of T4 DNA ligase (NEW ENGLAND BIOLABS®, 400,000 units/ml) can be added to the cells in the reservoir. The cells and ligase mix can be mixed by gently tilting the reservoir back and forth several times. Using a multichannel pipette, 40 μl of the cells in the ligase mix can be added to each well of annealed round 3 barcodes (plate L3). Each well can then be mixed by pipetting up and down gently 2-3 times and the samples can be incubated at 37° C. for 60 minutes.

10 μl of the diluted BC_0066 can be added to each well to stop the ligation. The samples can be incubated at 37° C. for 30 minutes. All the cells can be collected in a new disposable pipetting reservoir (10 ml). The cells can be transferred to a 15 ml conical centrifuge tube (FALCON™) and the tube can be filled with wash buffer (nuclease-free water, 0.05% Tween 20, and 25% formamide) to 15 ml. The samples can be incubated for 15 minutes at room temperature. The cells can then be pelleted at 500×g for 10 minutes and the liquid can be removed without disturbing the pellet. Each tube of cells can be resuspended in 100 μl PBS and the cells can be counted (e.g., on a hemocytometer or on a flow cytometer). In one example, 57,000 cells were retained. The number of cells to be sequenced can be chosen. In one example, the cells were split into 25 cell, 250 cell, 2,500 cell, and 25,000 cell aliquots. 300 μl of lysis buffer (10 mM NaF, 1 mM Na₃VO₄, 0.5% DOC buffer, and 0.5% TRITON™ X-100) can be added to each of the cell aliquots and each of the cell aliquots can be passed through a 25 gauge needle eight times.

Example 5—Binding Barcoded cDNA to Streptavidin Coated Beads

First, DYNABEADS® MYONE™ Streptavidin C1 beads can be resuspended. 20 μl of resuspended DYNABEADS® MYONE™ Streptavidin C1 beads (for each aliquot of cells) can be added to a 1.7 ml microcentrifuge tube (EPPENDORF®). The beads can be washed 3 times with 1×phosphate buffered saline Tween 20 (PBST) and resuspended in 20 μl PBST. 900 μl PBST can be added to the cell aliquot and 20 μl of washed C1 beads can be added to the aliquot of lysed cells. The samples can be placed on a gentle roller for 15 minutes at room temperature and then washed 3 times with 800 μl PBST using a magnetic tube rack (EPPENDORF®). The beads can then be resuspended in 100 μl PBS.

Example 6—RNase Treatment of Beads

A microcentrifuge tube (EPPENDORF®) comprising a sample can be placed against a magnetic tube rack (EPPENDORF®) for 2 minutes and then the liquid can be aspirated. The beads can be resuspended in an RNase reaction (3 μl RNase Mix (ROCHE™), 1 μl RNase H (NEW ENGLAND BIOLABS®), 5 μl RNase H 10×Buffer (NEW ENGLAND BIOLABS®), and 41 μl nuclease-free water). The sample can be incubated at 37° C. for 1 hour, removed from 37° C., and placed against a magnetic tube rack (EPPENDORF®) for 2 minutes. The sample can be washed with 750 μl of nuclease-free water+0.01% Tween 20 (H₂O-T), without resuspending the beads and keeping the tube disposed against the magnetic tube rack. The liquid can then be aspirated. The sample can be washed with 750 μl H₂O-T without resuspending the beads and while keeping the tube disposed against the magnetic tube rack. Next, the liquid can be aspirated while keeping the tube disposed against the magnetic tube rack. The tube can then be removed from the magnetic tube rack and the sample can be resuspended in 40 μl of nuclease-free water.

Example 7—3′ Adapter Ligation

With reference to FIG. 15, to facilitate PCR amplification, a single-stranded DNA adapter oligo (BC_0047) can be ligated to the 3′ end of cDNA. To prevent concatemers of the adapter oligo, dideoxycytidine (ddC) can be included at the 3′ end of the adapter oligo. BC_0047 was generated with a phosphate at the 5′ end and ddC at the 3′ end. Several enzymes are capable of ligating a single-stranded oligo to the 3′ end of single-stranded DNA. Herein, T4 RNA ligase 1 (NEW ENGLAND BIOLABS®) was used. Thermostable 5′ AppDNA/RNA Ligase (NEW ENGLAND BIOLABS®) can also be used with a preadenylated adapter oligo.

Specifically, 20 μl of the RNase-treated beads can be added to a single PCR tube. 80 μl of ligase mix (5 μl T4 RNA Ligase 1 (NEW ENGLAND BIOLABS®), 10 μl 10×T4 RNA ligase buffer, 5 μl BC_0047 oligo at 50 μM, 50 μl 50% PEG 8000, and 10 μl 10 mM ATP) can be added to the 20 μl of beads in the PCR tube. 50 μl of the ligase mixed with the beads can be transferred into a new PCR tube to prevent too many beads from settling to the bottom of a single tube and the sample can be incubated at 25° C. for 16 hours.

Example 8—Generating ILLUMINA® Compatible Sequencing Products

Ligation reactions from both PCR tubes can be combined into a single 1.7 ml microcentrifuge tube (EPPENDORF®). 750 μl of H₂O-T can be added to each sample. Each of the tubes can be placed on a magnetic tube rack (EPPENDORF®) for 2 minutes, the liquid can be aspirated, and the samples can be resuspended in 40 μl water. The samples can be transferred to PCR tubes. 60 μl of PCR mix can be added to each tube (50 μl 2×PHUSION® DNA Polymerase Master Mix (THERMO FISHER™ Scientific), 5 μl BC_0051 (10 μM), and 5 μl BC_0062 (10 μM)). 10 cycles of PCR can be run (98° C. for 3 minutes, repeat 10 times (98° C. for 10 seconds, 65° C. for 15 seconds, and 72° C. for 60 seconds), and 72° C. for 5 minutes). FIG. 16 depicts the PCR product. After the 3′ adapter oligo (BC_0047) has been ligated to barcoded cDNA, the cDNA can be amplified using PCR. As shown in FIG. 16 , the primers BC_0051 and BC_0062 were used.

The PCR samples from the previous step can be procured and the magnetic beads can be displaced to the bottom of each tube with a magnet. 90 μl of PCR reaction can be transferred to a new 1.7 ml tube without transferring any of the magnetic beads. 10 μl of nuclease-free water can be added to each of the 1.7 ml tubes to a total volume of 100 μl. 60 μl of AMPURE™ beads can be added to the 100 μl of PCR reaction (0.6×SPRI) and bound for 5 minutes. The tubes can be placed against a magnet for 2 minutes and the samples can be washed with 200 μl of 70% ethanol (30 second wait) without resuspending the beads. The samples can be washed again with 200 μl of 70% ethanol (30 second wait) without resuspending the beads and then the samples can be air dried for 5-10 minutes until the ethanol has evaporated.

Each of the samples can be resuspended in 40 μl of nuclease-free water. The tubes can be placed against a magnetic rack for 2 minutes. While the microcentrifuge tubes (EPPENDORF®) are still disposed against the magnetic rack, 38 μl of solution can be transferred to a new 1.7 ml tube, without transferring beads. 62 μl of nuclease-free water can be added to the samples to a total volume of 100 μl. 60 μl of AMPURE™ beads can then be added to 100 μl of the PCR reaction (0.6×SPRI) and bound for 5 minutes. The tubes can be placed against a magnet for 2 minutes and then the samples can be washed with 200 μl of 70% ethanol (30 second wait) without resuspending the beads. The samples can be washed again with 200 μl of 70% ethanol (30 second wait) without resuspending the beads and then the samples can be air dried for 5-10 minutes until the ethanol has evaporated.

The samples can be resuspended in 40 μl of nuclease-free water and each tube can be placed against a magnetic rack for 2 minutes. While the tube is still disposed against the magnetic rack, 38 μl of solution can be transferred to a new 1.7 ml tube, without transferring any beads. 20 μl of the 38 μl elution can be added to an optical PCR tube. Furthermore, a PCR mix can be added to the tube (25 μl PHUSION® DNA Polymerase Master Mix (THERMO FISHER™ Scientific), 2.5 μl BC_0027 (10 μM), 2.5 μl BC_0063 (10 μM), and 2.5 μl 20×EVAGREEN® (Biotium)). Following the PCR depicted in FIG. 16 , the full ILLUMINA® adapter sequences can be introduced through another round of PCR. As depicted in FIG. 17 , BC_0027 includes the flow cell binding sequence and the binding site for the TRUSEQ™ read 1 primer. BC_0063 includes the flow cell binding sequence and the TRUSEQ™ multiplex read 2 and index binding sequence. There is also a region for the sample index, which is GATCTG in this example.

The above samples can be run on a qPCR machine with the following cycling conditions: 1) 98° C. for 3 minutes, 2) 98° C. for 10 seconds, 3) 65° C. for 15 seconds, 4) 72° C. for 60 seconds, and 5) repeat steps 2-4 (e.g., 10-40 times, depending on when fluorescence stops increasing exponentially). The tube can be transferred to a thermocycler set to 72° C. for 5 minutes. The qPCR reaction can be run on a 1.5% agarose gel for 40 minutes and a 450-550 bp band can be removed and gel extracted (QIAQUICK® Gel Extraction Kit). The products can be sequenced on an ILLUMINA® MISEQ™ using paired end sequencing. The sequencing primers can be the standard TRUSEQ™ multiplex primers. Read 1 can sequence the cDNA sequence, while read 2 can cover the unique molecular identifier as well as the 3 barcode sequences (8 nucleotides each). Index read 1 can be used to sequence sample barcodes, so multiple samples may be sequenced together.

Example 9—Data Analysis

Sequencing reads were grouped by cell barcodes (three barcodes of eight nucleotides each, 96×96×96=884,736 total combinations). Each barcode combination should correspond to the cDNA from a single cell. Only reads with valid barcodes were retained. The sequencing reads with each barcode combination were aligned to both the human genome and the mouse genome. Reads aligning to both genomes were discarded. Multiple reads with the same unique molecular identifier were counted as a single read. Reads whose unique molecular identifiers differed by two or fewer mismatches were assumed to be generated by sequencing errors and were counted as a single read. For each unique barcode combination, the number of reads aligning to the human genome (x-axis) and the number of reads aligning to the mouse genome (y-axis) were plotted (see FIG. 18 ). As each cell is either mouse or human, it should ideally include only one type of RNA. An ideal plot would therefore have every point along the x- or y-axis. The fact that most points in the plot of FIG. 18 are near an axis indicates that the method is viable.
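The grouping and counting steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the analysis code actually used: the function names and the read tuple layout `(bc1, bc2, bc3, umi, species)` are hypothetical, each read is assumed to have already been assigned to "human", "mouse", or "both" by alignment, unique molecular identifiers are assumed to be collapsed separately for each barcode combination and species, and the greedy merging order is a simplification.

```python
from collections import defaultdict

# 96 barcodes per round over three rounds, as stated in the text.
TOTAL_COMBINATIONS = 96 ** 3  # 884,736


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umis, max_mismatches=2):
    """Greedily keep one representative per group of similar UMIs.

    A UMI within max_mismatches of an already kept UMI is treated as a
    sequencing-error copy (or exact duplicate) and is not counted again.
    """
    kept = []
    for umi in umis:
        if all(hamming(umi, k) > max_mismatches for k in kept):
            kept.append(umi)
    return kept


def count_molecules(reads, valid_barcodes, max_mismatches=2):
    """Return {barcode combination: {"human": n, "mouse": m}}."""
    umis = defaultdict(list)
    for bc1, bc2, bc3, umi, species in reads:
        cell = (bc1, bc2, bc3)
        # Retain only reads in which all three barcodes are valid.
        if any(bc not in valid_barcodes for bc in cell):
            continue
        # Discard reads aligning to both genomes (or to neither).
        if species not in ("human", "mouse"):
            continue
        umis[(cell, species)].append(umi)

    counts = defaultdict(lambda: {"human": 0, "mouse": 0})
    for (cell, species), cell_umis in umis.items():
        counts[cell][species] = len(collapse_umis(cell_umis, max_mismatches))
    return dict(counts)
```

Each key of the returned dictionary corresponds to one point in a plot such as FIG. 18, with the "human" count on the x-axis and the "mouse" count on the y-axis.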

Each point in the plot corresponds to cDNA with the same combination of barcodes and should represent the cDNA from a single cell. For each point, the number of reads that map uniquely to the mouse genome is plotted on the y-axis, while the number of reads that map uniquely to the human genome is plotted on the x-axis. If cDNAs with a specific combination of barcodes came from a single cell, all of the cDNA with the specific combination of barcodes should map completely to the human genome or completely to the mouse genome. As stated above, the fact that most barcode combinations map close to either the x-axis (human cells) or the y-axis (mouse cells) indicates that the method can indeed produce single-cell RNA sequencing data.
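One way to quantify how close a barcode combination lies to an axis is the fraction of its reads that map to the dominant species. The sketch below is illustrative only: the function names are hypothetical, and the 0.9 purity threshold is an assumed value chosen for the example, not a value taken from this disclosure.

```python
def species_purity(human_reads, mouse_reads):
    """Fraction of reads mapping to the dominant species, or None if no reads."""
    total = human_reads + mouse_reads
    if total == 0:
        return None
    return max(human_reads, mouse_reads) / total


def classify(human_reads, mouse_reads, min_purity=0.9):
    """Label a barcode combination as "human", "mouse", "mixed", or "empty".

    min_purity is an assumed, adjustable threshold for this example.
    """
    purity = species_purity(human_reads, mouse_reads)
    if purity is None:
        return "empty"
    if purity < min_purity:
        return "mixed"
    return "human" if human_reads >= mouse_reads else "mouse"


# Example: a point near the x-axis, one near the y-axis, and one far from both.
for human, mouse in [(950, 10), (5, 400), (300, 280)]:
    print(human, mouse, classify(human, mouse))
```

A barcode combination labeled "mixed" would correspond to a point far from both axes, for example where two cells received the same combination of barcodes.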

Certain embodiments of this disclosure are described herein, including the best mode known to the inventors for carrying out the disclosure. Of course, variations on these described embodiments will become apparent to those of ordinary skill in the art upon reading the foregoing description. The applicants expect skilled artisans to employ such variations as appropriate, and the applicants intend for the various embodiments of the disclosure to be practiced otherwise than specifically described herein. Accordingly, this disclosure includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.

Furthermore, numerous references have been made to patents and printed publications throughout this specification. Each of the above-cited references and printed publications is individually incorporated herein by reference in its entirety.

It is to be understood that the embodiments of the present disclosure are illustrative of the principles of the present disclosure. Other modifications that may be employed are within the scope of the disclosure. Thus, by way of example, but not of limitation, alternative configurations of the present disclosure may be utilized in accordance with the teachings herein. Accordingly, the present disclosure is not limited to that precisely as shown and described.

The particulars shown herein are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present disclosure only and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of various embodiments of the disclosure.

It will be apparent to those having skill in the art that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the disclosure. The scope of the present invention should, therefore, be determined only by the following claims. 

1. A method of cell-specifically labeling RNA molecules within cells, the method comprising: (a) fixing and permeabilizing a plurality of cells; (b) generating complementary DNA (cDNA) molecules within the plurality of cells by reverse transcribing RNA molecules within the cells, wherein the RNA molecules are reverse transcribed using reverse transcription (RT) primers each comprising: a poly(T) sequence or a random nucleotide sequence, and a 5′ overhang comprising a 5′ overhang sequence; and (c) tagging the cDNA molecules with one or more barcode sequences by performing steps (i) through (iii) one or more times: (i) dividing the plurality of cells comprising the cDNA molecules into a plurality of aliquots, wherein each aliquot comprises more than one cell; (ii) coupling nucleic acid tags to cDNA molecules within cells of the aliquots, wherein each nucleic acid tag comprises a barcode sequence, and wherein the barcode sequences present within the nucleic acid tags are specific to each aliquot; and (iii) combining the cells from the plurality of aliquots.
 2. The method of claim 1, further comprising: (d) dividing the combined cells from the plurality of aliquots into a plurality of samples; (e) lysing the cells in the plurality of samples to release the cDNA molecules; and (f) amplifying the released cDNA molecules in the plurality of samples using one or more amplification primers, wherein at least one of the amplification primers used in each sample comprises an index sequence, and wherein the index sequences present within the amplification primers are specific to each sample.
 3. The method of claim 1, wherein steps (i) through (iii) are repeated 1, 2, 3, 4, or 5 times.
 4. The method of claim 1, wherein steps (i) through (iii) are repeated a number of times sufficient to generate at least as many unique barcode sequence combinations as the total number of cells in the plurality of cells.
 5. The method of claim 1, wherein steps (i) through (iii) are repeated a number of times sufficient to provide a greater than 50%, 90%, 95%, or 99% probability that the cDNA molecules in each cell are bound to a unique barcode sequence combination.
 6. The method of claim 1, wherein the nucleic acid tags each comprise a 3′ region and/or a 5′ region flanking the barcode sequence.
 7. The method of claim 1, wherein the coupling comprises ligating 3′ ends of the nucleic acid tags to 5′ ends of the cDNA molecules.
 8. The method of claim 7, wherein the 3′ ends of the nucleic acid tags are brought into proximity of the 5′ ends of the cDNA molecules by prehybridizing the nucleic acid tags with linker nucleic acid strands that are each complementary to a 3′ terminal sequence of a nucleic acid tag and a 5′ terminal sequence of a cDNA molecule.
 9. The method of claim 8, wherein the 5′ terminal sequence of the cDNA molecule comprises a 5′ overhang sequence from an RT primer or a 5′ region from a previously coupled nucleic acid tag.
 10. The method of claim 8, wherein the coupling is stopped by introducing a plurality of ligation stop oligos that are each complementary to all or part of a linker nucleic acid strand.
 11. The method of claim 1, wherein the nucleic acid tags are DNA tags.
 12. The method of claim 1, wherein the nucleic acid tags that are coupled to the cDNA molecules during the last of the one or more times comprise a unique molecular identifier (UMI), a capture agent, a flow-cell binding site, and/or a primer-binding site.
 13. The method of claim 12, wherein the capture agent comprises biotin.
 14. The method of claim 1, wherein the barcode sequences each comprise at least 8 nucleotides.
 15. The method of claim 1, wherein the plurality of cells is selected from the group consisting of mammalian cells, yeast cells, bacterial cells, and combinations thereof.
 16. The method of claim 1, wherein all of the RT primers comprise a poly(T) sequence.
 17. The method of claim 1, wherein all of the RT primers comprise a random nucleotide sequence.
 18. The method of claim 2, further comprising: (g) sequencing the cDNA molecules amplified in step (f).
 19. The method of claim 18, further comprising: (h) grouping the sequencing reads obtained in step (g) by barcode sequence and/or index sequence.
 20. The method of claim 19, wherein the sequencing reads are grouped by a combination of barcode sequence and index sequence.
 21. The method of claim 20, wherein grouping by barcode sequence comprises grouping by barcode sequence combination.
 22. The method of claim 1, wherein the plurality of aliquots comprises 96 aliquots distributed in a 96-well plate. 